讽刺是一种越来越普遍的语言现象,并且由于其主观性,缺乏背景和深刻的意见而令人挑战。在多模式设置中,讽刺通过文本和视觉实体之间的不协调传达。虽然最近接近讽刺作为分类问题,但目前尚不清楚为什么在线帖子被识别为讽刺。如果没有正确的解释,最终用户可能无法察觉潜在的讽刺意义。在本文中,我们提出了一种新的问题 - 多式联运讽刺解释(Muse) - 鉴于包含图像和标题的多模式讽刺柱,我们旨在产生自然语言解释,以揭示预期的讽刺。为此,我们开发更多,一个新的数据集,具有3510个讽刺的多模式帖子的解释。每个解释都是一种描述隐藏的讽刺的自然语言(英语)句子。我们通过采用基于多模式变换器的架构来基准更多。它包含了变压器编码器中的跨模型注意,它出席了两个模式之间的区别。随后,将基于BART的自动回归解码器用作发电机。经验结果表明,在五项评估指标上展示了各种基线(用于缪斯采用)的结果。我们还对预测进行人类评估,并获得0.4的FARISS Kappa得分,作为25个评估员之间的公平协议。
translated by 谷歌翻译
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.
translated by 谷歌翻译
A framework for creating and updating digital twins for dynamical systems from a library of physics-based functions is proposed. The sparse Bayesian machine learning is used to update and derive an interpretable expression for the digital twin. Two approaches for updating the digital twin are proposed. The first approach makes use of both the input and output information from a dynamical system, whereas the second approach utilizes output-only observations to update the digital twin. Both methods use a library of candidate functions representing certain physics to infer new perturbation terms in the existing digital twin model. In both cases, the resulting expressions of updated digital twins are identical, and in addition, the epistemic uncertainties are quantified. In the first approach, the regression problem is derived from a state-space model, whereas in the latter case, the output-only information is treated as a stochastic process. The concepts of It\^o calculus and Kramers-Moyal expansion are being utilized to derive the regression equation. The performance of the proposed approaches is demonstrated using highly nonlinear dynamical systems such as the crack-degradation problem. Numerical results demonstrated in this paper almost exactly identify the correct perturbation terms along with their associated parameters in the dynamical system. The probabilistic nature of the proposed approach also helps in quantifying the uncertainties associated with updated models. The proposed approaches provide an exact and explainable description of the perturbations in digital twin models, which can be directly used for better cyber-physical integration, long-term future predictions, degradation monitoring, and model-agnostic control.
translated by 谷歌翻译
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g. real-world to simulated kitchen) and agent embodiments (e.g. bimanual human demonstration to robotic arm). We evaluate on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks, using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g. collected as human videos, for learning in any number of target domains.
translated by 谷歌翻译
Because of their close relationship with humans, non-human apes (chimpanzees, bonobos, gorillas, orangutans, and gibbons, including siamangs) are of great scientific interest. The goal of understanding their complex behavior would be greatly advanced by the ability to perform video-based pose tracking. Tracking, however, requires high-quality annotated datasets of ape photographs. Here we present OpenApePose, a new public dataset of 71,868 photographs, annotated with 16 body landmarks, of six ape species in naturalistic contexts. We show that a standard deep net (HRNet-W48) trained on ape photos can reliably track out-of-sample ape photos better than networks trained on monkeys (specifically, the OpenMonkeyPose dataset) and on humans (COCO) can. This trained network can track apes almost as well as the other networks can track their respective taxa, and models trained without one of the six ape species can track the held out species better than the monkey and human models can. Ultimately, the results of our analyses highlight the importance of large specialized databases for animal tracking systems and confirm the utility of our new ape database.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
translated by 谷歌翻译
Masked image modelling (e.g., Masked AutoEncoder) and contrastive learning (e.g., Momentum Contrast) have shown impressive performance on unsupervised visual representation learning. This work presents Masked Contrastive Representation Learning (MACRL) for self-supervised visual pre-training. In particular, MACRL leverages the effectiveness of both masked image modelling and contrastive learning. We adopt an asymmetric setting for the siamese network (i.e., encoder-decoder structure in both branches), where one branch with higher mask ratio and stronger data augmentation, while the other adopts weaker data corruptions. We optimize a contrastive learning objective based on the learned features from the encoder in both branches. Furthermore, we minimize the $L_1$ reconstruction loss according to the decoders' outputs. In our experiments, MACRL presents superior results on various vision benchmarks, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and two other ImageNet subsets. Our framework provides unified insights on self-supervised visual pre-training and future research.
translated by 谷歌翻译
ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
预测经济的短期动态 - 对经济代理商决策过程的重要意见 - 经常在线性模型中使用滞后指标。这通常在正常时期就足够了,但在危机期间可能不足。本文旨在证明,在非线性机器学习方法的帮助下,非传统和及时的数据(例如零售和批发付款)可以为决策者提供复杂的模型,以准确地估算几乎实时的关键宏观经济指标。此外,我们提供了一组计量经济学工具,以减轻机器学习模型中的过度拟合和解释性挑战,以提高其政策使用的有效性。我们的模型具有付款数据,非线性方法和量身定制的交叉验证方法,有助于提高宏观经济的启示准确性高达40 \% - 在COVID-19期间的增长较高。我们观察到,付款数据对经济预测的贡献很小,在低和正常增长期间是线性的。但是,在强年或正增长期间,付款数据的贡献很大,不对称和非线性。
translated by 谷歌翻译